ON THE ESTIMATION OF PRODUCTION FRONTIERS:

Maximum  Likelihood  Estimation  of  the  Parameters  of  a 

Discontinuous  Density  Function 

by 

D.  J.  Aigner,  T.  Amemiya,  and  D.  J.  Poirier* 


This  paper  reports  on  our  attempts  to  construct  a  class  of  estimators 
for  the  classical  linear  regression  model  that  allows  for  different  weights 
to  be  placed  on  positive  and  negative  residuals.  The  motivation  for 
studying  such  a  problem  derives  mainly  from  previous  efforts  to  quantify 
the  notion  of  a  "frontier"  production  function  (Afriat  (1972),  Aigner  and 
Chu  (1968),  Timmer  (1969,  1971)).  Obviously  a  true  frontier  involves  only 
one-sided  residuals.  But  there  are  reasonable  objections  to  the  "full" 
frontier  function  (Timmer  (1971))  that  call  for  some  compromise  between  it 
and  the  usual  "average"  production  function.  Weighting  positive  and 
negative  disturbances  in  a  quadratic  criterion  function  offers  one  way  of 
approaching  this  compromise. 

There  is  at  least  one  other  justification  for  considering  an  asymmetric 
criterion function of the sort just described. It lies in treating
asymmetric consequences of under- or over-forecasting in a regression
context. But as the reader will note in what follows, we concentrate on
within-sample forecasting. Were primary interest focused on asymmetric
losses of under- or over-forecasting for a particular vector of
out-of-sample values for exogenous variables, we would approach the problem
in  a  different  manner  (cf.  Poirier  (1973,  Ch.  9)).   In  any  event, 
consideration  of  the  problem  as  posed  allows  for  a  unified  treatment  of 
frontier  estimation,  ordinary  least  squares,  and  intermediate  cases  of 
apparent  empirical  interest. 

1.   The Statistical Model

We assume that a sample of n independent observations is available,
having been generated by the model

(1)  $y = X\beta + \varepsilon$,

where $y$ is an $n \times 1$ vector of observations on the dependent
variable, $X$ is an $n \times k$ matrix of observations on k fixed
regressors (including a column of ones), and $\beta$ is the $k \times 1$
vector of unknown regression coefficients. Finally, each element of the
$n \times 1$ disturbance vector $\varepsilon$ is determined by

(2)  $\varepsilon_i = \begin{cases} \varepsilon_i^* / \sqrt{1-\theta} & \text{if } \varepsilon_i^* > 0 \\ \varepsilon_i^* / \sqrt{\theta} & \text{if } \varepsilon_i^* \le 0 \end{cases} \qquad i = 1, \ldots, n,$

where $\varepsilon_i^* \sim N(0, \sigma^2)$ for $0 < \theta < 1$, and
$\varepsilon_i$ has either the negative or positive truncated normal
distribution (mean $\mp 0.798\sigma$, variance $0.363\sigma^2$) when
$\theta = 1$ or $\theta = 0$, respectively.

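The density function thus defined is discontinuous at $\varepsilon = 0$.
Nevertheless, moments of $\varepsilon$ exist and are easily derived. For
example, using (2) and calculating the appropriate partial moments, we find:

(3)  $E(\varepsilon) = \dfrac{\sigma}{\sqrt{2\pi}} \cdot \dfrac{\sqrt{\theta} - \sqrt{1-\theta}}{\sqrt{\theta}\,\sqrt{1-\theta}}$

and

(4)  $V(\varepsilon) = \dfrac{\sigma^2}{2\theta(1-\theta)} \left\{ 1 - \dfrac{(\sqrt{\theta} - \sqrt{1-\theta})^2}{\pi} \right\}$,  for $0 < \theta < 1$.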
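The moment formulas can be checked numerically. The following is a minimal Python sketch, not part of the original study: it draws disturbances by the two-piece rule in (2) and compares the sample mean and variance with (3) and (4). The function names, the seed, and the number of draws are illustrative assumptions.

```python
import math
import random


def draw_eps(theta, sigma, rng):
    """Draw one disturbance from the two-piece scheme in (2), 0 < theta < 1."""
    e_star = rng.gauss(0.0, sigma)          # underlying N(0, sigma^2) variate
    if e_star > 0:
        return e_star / math.sqrt(1.0 - theta)
    return e_star / math.sqrt(theta)


def mean_eps(theta, sigma):
    """E(eps) as in equation (3)."""
    d = math.sqrt(theta) - math.sqrt(1.0 - theta)
    return sigma / math.sqrt(2.0 * math.pi) * d / math.sqrt(theta * (1.0 - theta))


def var_eps(theta, sigma):
    """V(eps) as in equation (4)."""
    d = math.sqrt(theta) - math.sqrt(1.0 - theta)
    return sigma ** 2 / (2.0 * theta * (1.0 - theta)) * (1.0 - d * d / math.pi)


def simulate_moments(theta, sigma, n_draws=200_000, seed=0):
    """Return the sample mean and variance of n_draws simulated disturbances."""
    rng = random.Random(seed)
    draws = [draw_eps(theta, sigma, rng) for _ in range(n_draws)]
    m = sum(draws) / n_draws
    v = sum((x - m) ** 2 for x in draws) / (n_draws - 1)
    return m, v


if __name__ == "__main__":
    m, v = simulate_moments(1.0 / 3.0, 1.0)
    print("simulated:", m, v)
    print("formulas: ", mean_eps(1.0 / 3.0, 1.0), var_eps(1.0 / 3.0, 1.0))
```

At $\theta = 1/2$ the two branches of (2) coincide, so the mean is zero and the variance is $2\sigma^2$; the sketch reproduces both values exactly.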


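Moreover, a likelihood function can be formulated that encompasses
these underlying assumptions, and it will be of the form (concentrated
over $\sigma^2$):

(5)  $\ln L(y \mid \beta, \varepsilon, \theta) \propto \dfrac{n_1}{2} \ln\theta + \dfrac{n_2}{2} \ln(1-\theta) - \dfrac{n}{2} \ln \Big\{ \theta \sum_{\varepsilon_i < 0} \varepsilon_i^2 + (1-\theta) \sum_{\varepsilon_i > 0} \varepsilon_i^2 \Big\}$

where $n_1$ and $n_2$ are defined to be the numbers of terms in the first
and second sums, respectively. From a computational point of view the
likelihood as stated generally involves k $\beta_j$'s, $\theta$, and
n $\varepsilon_i$'s. In effect, we are asked to determine the
$\varepsilon_i$'s through $\beta$ (from the model) and to place each one in
the "appropriate" sum (i.e., weight each $\varepsilon_i^2$ by either
$\theta$ or $(1-\theta)$).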

To clarify this latter statement, suppose we define the indicator
variables $\{z_i\}$ by

(6)  $z_i = \begin{cases} 1 & \text{if } y_i - X_i'\beta \le 0 \\ 0 & \text{if } y_i - X_i'\beta > 0 \end{cases} \qquad i = 1, \ldots, n$

where $X_i'$ is the ith row of $X$. Then (5) can be rewritten as

(7)  $\ln L(y \mid \beta, z, \theta) \propto \dfrac{1}{2} \sum_{i=1}^{n} \left[ z_i \ln\theta + (1-z_i) \ln(1-\theta) \right] - \dfrac{n}{2} \ln \sum_{i=1}^{n} \left[ z_i\theta + (1-z_i)(1-\theta) \right] \varepsilon_i^2$,

which is seen to involve k "main" parameters (the $\beta_j$'s) and n
"nuisance" parameters (the $z_i$'s), along with $\theta$. From this
formulation it is also apparent that the parameter $n_1$ in (5) is itself
not a free parameter, but is defined as $n_1 = \sum_{i=1}^{n} z_i$.
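The indicator form of the concentrated likelihood translates directly into code. The sketch below is a hedged Python illustration of (6) and (7), evaluated up to an additive constant for given residuals and a given $\theta$ in (0, 1); the function name and the toy residual vector are assumptions made for the example.

```python
import math


def concentrated_loglik(resid, theta):
    """Concentrated log-likelihood (7), up to an additive constant."""
    n = len(resid)
    z = [1 if e <= 0 else 0 for e in resid]      # indicator variables (6)
    n1 = sum(z)                                  # n_1 is the sum of the z_i
    weighted_ss = sum(
        (zi * theta + (1 - zi) * (1.0 - theta)) * e * e
        for zi, e in zip(z, resid)
    )
    return (0.5 * n1 * math.log(theta)
            + 0.5 * (n - n1) * math.log(1.0 - theta)
            - 0.5 * n * math.log(weighted_ss))


if __name__ == "__main__":
    resid = [-1.0, 1.0, -2.0, 2.0]               # symmetric toy residuals
    for theta in (0.3, 0.5, 0.7):
        print(theta, concentrated_loglik(resid, theta))
```

For residuals that are symmetric about zero, as in the toy vector, the function is largest at $\theta = 1/2$ among the three values tried, which is what the equal mean squares on the two sides would suggest.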

This way of presenting the model makes it reminiscent of the
"$\lambda$-method" for switching regressions due to Quandt (1972) where
$\lambda = 1/2$ is known in advance. In fact, the likelihood (5) can be
derived within Quandt's framework under the assumption that observations
are equally likely to have been generated by a negative or positive
half-normal distribution with parameters $\sigma^2/\theta$ or
$\sigma^2/(1-\theta)$, respectively.

The problem of maximizing (5) is not trivial. Moreover, whether
the resulting estimators possess all the usual maximum likelihood (ML)
properties of consistency, asymptotic normality, and efficiency is not
obvious, since the $z_i$'s are discrete variables. Leaving this matter in
abeyance for the moment, suppose that merely for computational ease we
consider the "minimum distance" estimator of $\beta$ (and $\varepsilon$)
determined by minimizing

(8)  $S(y \mid \beta, \varepsilon, \theta) = \theta \sum_{\varepsilon_i < 0} \varepsilon_i^2 + (1-\theta) \sum_{\varepsilon_i > 0} \varepsilon_i^2$,

which is seen to be equivalent to maximizing the second term in (5).
This asymmetric criterion function contains ordinary least squares
($\theta = 1/2$) and "full" frontier estimation ($\theta = 0$ or
$\theta = 1$) as special cases, and is equivalent to ML estimation in these
instances. Otherwise, it allows for unequal weights to be given to
disturbances of differing sign, under the interpretation that a larger
relative weight should be given to less variable disturbances.
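To make the criterion (8) concrete, here is a small Python sketch for a single regressor plus intercept. It minimizes (8) by iteratively reweighted least squares, which is a substitute technique chosen for brevity and is not the quadratic programming approach used in the paper. The function names and the five data points are hypothetical.

```python
def asym_ss(resid, theta):
    """Criterion (8): weight theta on negative residuals, 1 - theta on positive."""
    return sum((theta if e < 0 else 1.0 - theta) * e * e for e in resid)


def residuals(x, y, b0, b1):
    """Residuals of the line b0 + b1 * x."""
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


def fit_asym(x, y, theta, max_iter=200, tol=1e-12):
    """Minimize (8) for y = b0 + b1 * x by iteratively reweighted least squares.

    Requires 0 < theta < 1 so that every observation keeps a positive weight.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Weights follow the sign of the current residuals.
        w = [theta if e < 0 else 1.0 - theta for e in residuals(x, y, b0, b1)]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        # Closed-form weighted least squares for two coefficients.
        new_b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        new_b0 = (sy - new_b1 * sx) / sw
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.0, 2.9, 5.2, 6.8, 9.1]
    print("theta = 0.5 (OLS):", fit_asym(x, y, 0.5))
    print("theta = 0.2:      ", fit_asym(x, y, 0.2))
```

With $\theta = 1/2$ all weights are equal and the routine returns the OLS line. With $\theta = 0.2$ negative residuals are penalized less, so the fitted line shifts upward relative to OLS.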

Some discussion of the above specifications seems appropriate at
this point. Though discontinuous, the density function for each
$\varepsilon_i$ is perfectly legitimate. Moreover, the $\varepsilon_i$'s
are homoscedastic but with nonzero mean. Therefore, ordinary least squares
(OLS) applied to (1) without regard to the actual value of $\theta$ will in
general produce BLUE and consistent estimators of all coefficients except
the intercept. In a production function context, hypothesis tests
concerning any element of $\beta$ except the intercept parameter can be
carried out as usual, based on OLS. Indeed, if the sample is a
cross-section of firms in a particular industry, firms can even be
appropriately ranked for relative efficiency based on their OLS estimated
disturbances, since the biased (and inconsistent) intercept estimator
affects them all similarly. Use of the criterion function (8) or the "full"
likelihood function (5) is aimed at obtaining consistent and asymptotically
efficient estimators of all parameters, including the intercept term,
through an "appropriate" weighting of observations.

Interpretation of $\theta$ as a measure of relative variability of
observations above and below the point $\varepsilon_i = 0$ follows easily
from the following scenario. Again within the industry production function
context, if the source of (random) difference between firms in their
"production" of y for given x derives only from inherent differences in the
availability of and/or ability to utilize "best practice" technology, the
appropriate error distribution should be one-sided
($\varepsilon_i \le 0$). If either symmetric measurement error (in y) or
the influence of a symmetric and additive random input is considered as
well, it is apparent that the relative variability in y will differ for
firms above and below the point $\varepsilon_i = 0$. How much it differs is
what $\theta$ measures, which explains why, for example, $\theta$ might be
set equal to one ("full" frontier). As technological differences dominate
the aforementioned symmetric error influences, $\theta \to 1$. Otherwise
$0 \le \theta < 1$, reflecting the relative importance of these "error
components" in determining the observed distribution of firms.

1
Several of these points have been made independently by P. Schmidt
(1974) in a recent note.

2.   Estimation with $\theta$ Known

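Since in (8) the index sets for the two summations are endogenously
determined, it would appear there will be difficulty in locating the
minimum of $S(y \mid \beta, \varepsilon, \theta)$ even if $\theta$ is
known. However, it is shown in Appendix A that if a unique global minimum
exists for the problem

(9)  $\min_{\{\beta,\, \varepsilon\}} S(y, \theta \mid \beta, \varepsilon) = \theta \sum_{\varepsilon_i < 0} \varepsilon_i^2 + (1-\theta) \sum_{\varepsilon_i > 0} \varepsilon_i^2$

     s.t.  $y = X\beta + \varepsilon$

then the same solution is obtained for the problem

(10) $\min_{\{\beta^+,\, \beta^-,\, \varepsilon^+,\, \varepsilon^-\}} S^*(y, \theta \mid \beta^+, \beta^-, \varepsilon^+, \varepsilon^-) = \theta \sum_{i=1}^{n} (\varepsilon_i^-)^2 + (1-\theta) \sum_{i=1}^{n} (\varepsilon_i^+)^2$

     s.t.  $y = X\beta^+ - X\beta^- + \varepsilon^+ - \varepsilon^-$,

           $\varepsilon^+ \ge 0$,  $\beta^+ \ge 0$,

           $\varepsilon^- \ge 0$,  $\beta^- \ge 0$,

with $\varepsilon^+$, $\varepsilon^-$ being $n \times 1$ vectors of signed
disturbances, $\beta^+$, $\beta^-$ being $k \times 1$ vectors of signed
parameters, and where also $X'X$ is assumed to be nonsingular. This latter
problem is formulated as a quadratic programming (QP) problem in 2n + 2k
unknowns.²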
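The device behind (10), writing a sign-unrestricted quantity as the difference of two nonnegative parts, can be illustrated with a few lines of Python. This sketch only checks the objective values for a fixed residual vector; it does not solve the QP. The names and the numerical values are hypothetical.

```python
def split(v):
    """Write v = v_plus - v_minus with both parts nonnegative and min(...) = 0."""
    return max(v, 0.0), max(-v, 0.0)


def s_original(resid, theta):
    """Objective of (9): sums over negative and positive residuals."""
    return sum((theta if e < 0 else 1.0 - theta) * e * e for e in resid)


def s_split(e_plus, e_minus, theta):
    """Objective of (10) in the signed disturbance vectors."""
    return (theta * sum(m * m for m in e_minus)
            + (1.0 - theta) * sum(p * p for p in e_plus))


resid = [0.4, -1.5, 0.0, 2.0, -0.3]
theta = 0.25
parts = [split(e) for e in resid]
e_plus = [p for p, _ in parts]
e_minus = [m for _, m in parts]

# Adding the same a > 0 to both parts leaves e_plus - e_minus unchanged,
# so the equality constraint in (10) still holds, but the objective rises.
a = 0.1
shifted_plus = [p + a for p in e_plus]
shifted_minus = [m + a for m in e_minus]

print(s_original(resid, theta), s_split(e_plus, e_minus, theta))
print(s_split(shifted_plus, shifted_minus, theta))
```

The shifted pair satisfies the same constraints with a strictly larger objective, which is the heart of the contradiction argument that at a minimum at most one of each pair is nonzero.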

A proof of the inconsistency of the $\hat\beta = (\hat\beta^+ - \hat\beta^-)$
that emerges from a solution to (10) is contained in Appendix B. (But
recall that only one element of $\hat\beta$ is inconsistent: the intercept
estimator.) Therefore, our formalization of Timmer's suggestion in terms of
this particular asymmetric weighting of residuals leads to the conclusion
that nothing is to be gained from the effort, at least insofar as bias
correction in the intercept is concerned.³ Consistent estimators of the
intercept are available, however.

2
A key result in proving that (9) and (10) pose equivalent problems is
to show $\min(\varepsilon_i^+, \varepsilon_i^-) = 0$ for all i and
$\min(\beta_j^+, \beta_j^-) = 0$ for all j. This is easily argued by
contradiction. (See Appendix A.) Effectively, then, there are n + k
unknowns in the problem.

Along these same lines, see the interesting programming applications
to unbiased estimation in the classical regression model contained in
Sielken and Hartley (1973).

3
This last remark leads us to point out again that OLS is a special
case of (10), when $\theta = 1/2$. No efficiency gains of estimation will
be realized by recognizing the signs of residuals in the usual OLS
criterion function either, whereas it might seem there is latitude for that
to be the case. The point is that a priori information on the signs of
residuals is not involved, but merely an assignment into index sets which
then receive the same weight in the criterion function. However obvious
this conclusion might appear, we provide a formal demonstration of it
in Appendix C.

One technique is "corrected" least squares, using equations (3)
and (4). In much the same way as the OLS intercept estimate can be
"corrected" for bias in the Cobb-Douglas model with a multiplicative
lognormal disturbance, in this problem $E(\varepsilon_i)$ is a function of
known and estimable parameters (for known $\theta$) and such correction can
also be accomplished. Therefore, if $V(\varepsilon)$ is consistently
estimable, say by a statistic $\hat V(\varepsilon)$, (4) can be manipulated
to yield

(11)  $\hat\sigma^2 = \dfrac{\hat V(\varepsilon)\,[2\theta(1-\theta)]}{\left\{ 1 - (\sqrt{\theta} - \sqrt{1-\theta})^2 / \pi \right\}}$

and, using (3),

(12)  $\hat E(\varepsilon) = \dfrac{\sqrt{\hat V(\varepsilon)}\; (\sqrt{\theta} - \sqrt{1-\theta})}{\left[ \pi - (\sqrt{\theta} - \sqrt{1-\theta})^2 \right]^{1/2}}$.

These equations provide consistent estimators for $\sigma^2$ and
$E(\varepsilon)$, respectively. For example, if $\theta = 1/3$,
$\hat E(\varepsilon) = -0.1362 \sqrt{\hat V(\varepsilon)}$. Therefore,
writing the OLS intercept estimator as $b_1$, the "corrected" OLS intercept
estimator is given simply by:

(13)  $\hat\beta_1 = b_1 - \hat E(\varepsilon)$,

which is $\hat\beta_1 = b_1 + 0.1362 \sqrt{\hat V(\varepsilon)}$ for
$\theta = 1/3$. Since OLS will yield an unbiased estimator for
$V(\varepsilon)$ no matter what the true $\theta$ may be, (13) can
be implemented (for known $\theta$) and may produce a useful estimator.
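The intercept correction is a one-line computation once $\theta$ is given. The Python sketch below evaluates the ratio $E(\varepsilon)/\sqrt{V(\varepsilon)}$ implied by (3) and (4) and applies it as in (13); the function names and the numerical inputs in the usage lines are hypothetical.

```python
import math


def mean_to_sd_ratio(theta):
    """E(eps) / sqrt(V(eps)) implied by (3) and (4), for 0 < theta < 1."""
    d = math.sqrt(theta) - math.sqrt(1.0 - theta)
    return d / math.sqrt(math.pi - d * d)


def corrected_intercept(b1, v_hat, theta):
    """Equation (13): remove the estimated disturbance mean from the OLS intercept b1."""
    return b1 - mean_to_sd_ratio(theta) * math.sqrt(v_hat)


if __name__ == "__main__":
    print(mean_to_sd_ratio(1.0 / 3.0))             # about -0.1362
    print(corrected_intercept(0.5, 4.0, 1.0 / 3.0))  # hypothetical b1 and V-hat
```

The ratio changes sign when $\theta$ is replaced by $1 - \theta$, so the correction raises the OLS intercept for $\theta < 1/2$ and lowers it for $\theta > 1/2$.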
Furthermore, it is argued in Appendix B that minimizing the
criterion function

(14)  $S(y, \theta \mid \beta, \varepsilon) = \sqrt{\theta} \sum_{\varepsilon_i < 0} \varepsilon_i^2 + \sqrt{1-\theta} \sum_{\varepsilon_i > 0} \varepsilon_i^2$

with respect to $\beta$ (for known $\theta$) will yield consistent
estimates of all regression parameters, including the intercept.
Computationally this is as difficult a problem as (10). In both cases, (8)
and (14), since $\theta$ is presumed known at this point, the relevant
variance-covariance matrix is available immediately upon recognizing that
these minimum distance estimators have the form of generalized least
squares estimators given a partitioning of the sample.

Finally, since it is known that the median of the $\varepsilon_i$'s is
zero, the minimum absolute deviations (MAD) estimator is also consistent
for $\beta$. Likewise, if we concentrate on consistent estimation of
$\beta_1$, the intercept, then, for the same reason, the sample median of
$\{y_i - X_{i2}' b_2\}$, where $X_{i2}'$ is the ith row of $X$ excluding
the first element and $b_2$ is the $(k-1) \times 1$ vector of OLS slope
estimates, is consistent for $\beta_1$.
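The zero-median property is easy to see in simulation: the sign of each disturbance is the sign of the underlying normal variate, so half of the draws fall on each side of zero whatever the value of $\theta$. The Python sketch below, with an assumed seed and sample size, contrasts the sample median with the clearly nonzero sample mean for $\theta = 0.2$.

```python
import math
import random
import statistics


def draw_eps(theta, sigma, rng):
    """Two-piece disturbance as in (2), for 0 < theta < 1."""
    e_star = rng.gauss(0.0, sigma)
    scale = math.sqrt(1.0 - theta) if e_star > 0 else math.sqrt(theta)
    return e_star / scale


rng = random.Random(1)
draws = [draw_eps(0.2, 1.0, rng) for _ in range(100_001)]
sample_median = statistics.median(draws)   # close to zero
sample_mean = statistics.fmean(draws)      # negative for theta < 1/2
print(sample_median, sample_mean)
```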
3.   $\theta$ Unknown.

The techniques just discussed can be modified to encompass estimation
of $\theta$. In order to present those modifications we must first derive
the "full" ML estimators for $\theta$ and $\beta$.

As shown in Appendix B, differentiating (5) with respect to $\theta$ and
solving the resulting first-order equation gives as the ML estimator,
$\hat\theta$,

(15)  $\hat\theta = \dfrac{\frac{1}{n_2} \sum_2 (y_i - X_i'\beta)^2}{\frac{1}{n_1} \sum_1 (y_i - X_i'\beta)^2 + \frac{1}{n_2} \sum_2 (y_i - X_i'\beta)^2}$

where, for ease of notation, we write $\sum_1$ for
$\sum_{\varepsilon_i < 0}$ and $\sum_2$ for $\sum_{\varepsilon_i > 0}$, and
$X_i'$ is the ith row of $X$. Concentrating the likelihood further by
inserting (15) into (5), we find that the ML estimator for $\beta$
minimizes

(16)  $Q = \dfrac{n_1}{n} \ln \dfrac{\sum_1 (y_i - X_i'\beta)^2}{n_1} + \dfrac{n_2}{n} \ln \dfrac{\sum_2 (y_i - X_i'\beta)^2}{n_2}$.
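Equations (15) and (16) depend on the residuals only through the mean squared residual on each side of zero. The following Python sketch computes both quantities and includes a helper that evaluates the concentrated likelihood (5), so that the maximizing property of (15) can be checked on a toy residual vector. All names and numbers are illustrative assumptions, and the code assumes at least one residual on each side of zero.

```python
import math


def split_mean_squares(resid):
    """Mean squared residual below zero (a1) and above zero (a2), with counts."""
    neg = [e * e for e in resid if e < 0]
    pos = [e * e for e in resid if e > 0]
    return sum(neg) / len(neg), len(neg), sum(pos) / len(pos), len(pos)


def theta_hat(resid):
    """Equation (15): share of the positive-side mean square."""
    a1, _, a2, _ = split_mean_squares(resid)
    return a2 / (a1 + a2)


def q_criterion(resid):
    """Equation (16): weighted sum of the logs of the two mean squares."""
    a1, n1, a2, n2 = split_mean_squares(resid)
    n = n1 + n2
    return (n1 / n) * math.log(a1) + (n2 / n) * math.log(a2)


def loglik(resid, theta):
    """Concentrated log-likelihood (5), up to a constant, used to check (15)."""
    s1 = sum(e * e for e in resid if e < 0)
    s2 = sum(e * e for e in resid if e > 0)
    n1 = sum(1 for e in resid if e < 0)
    n2 = sum(1 for e in resid if e > 0)
    return (0.5 * n1 * math.log(theta) + 0.5 * n2 * math.log(1.0 - theta)
            - 0.5 * (n1 + n2) * math.log(theta * s1 + (1.0 - theta) * s2))


if __name__ == "__main__":
    resid = [-1.0, -1.0, 2.0, 2.0]
    t = theta_hat(resid)
    print(t, q_criterion(resid))
    print(loglik(resid, t - 0.05), loglik(resid, t), loglik(resid, t + 0.05))
```

In the toy example the negative residuals are the less variable ones, and the estimate of $\theta$ exceeds 1/2, in line with the interpretation that the larger weight goes to the less variable side.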


In Appendix B it is shown that the ML estimator based on (16) is
consistent for $\beta$. In light of the work of Quandt on the switching
regression model, this represents the first proof of consistency for such
a model, albeit such a special one. Previously, only Monte Carlo results
were available that suggested consistency. Apparently, the fact that the
"mixture parameter" is known to us, that all other (regression) parameters
are identical in the two regimes, and that the disturbance variances in
the regimes are related in a particular way, are sufficient to identify
$\beta$.

Equation (16) is clearly nonlinear in $\beta$ and promises to be more
complicated to optimize than either (8) or (14). For this reason we focus
our attention on iterative techniques that make use of (15) in conjunction
with the methods of the previous section.

One "pseudo-ML" method uses the OLS slope estimators, $b_2$, and an
iterated intercept estimate, formed by inserting a consistent estimate of
$\beta_1$ along with $b_2$ into (15), calculating $\hat\theta^{(1)}$,
inserting $\hat\theta^{(1)}$ into (13) to get $\hat\beta_1^{(1)}$, and
iterating. The final estimates are, say, $\hat\theta$, $\hat\beta_1$, and
$b_2$. $\hat\sigma^2$ is always available through⁴

(17)  $\hat\sigma^2 = \dfrac{1}{n} \left[ \theta \sum_1 (y_i - X_i'\beta)^2 + (1-\theta) \sum_2 (y_i - X_i'\beta)^2 \right]$

with estimates of $\theta$ and $\beta$ inserted. By an argument in
Appendix B, the iterative procedure is expected to be stable.

Another procedure, which re-estimates all elements of $\beta$ at each
step of the iteration, combines (15) with (14). To begin, we insert a
consistent estimate of $\beta$ into (15), calculating $\hat\theta^{(1)}$.
This value is then used in (14) and we use a QP procedure to minimize (14)
with respect to $\beta$, yielding $\hat\beta^{(1)}$. The process is
repeated until convergence is achieved.

Obviously this latter method is the more costly. Whether it has
any advantages depends on comparative small sample properties. In fact,
we are unable to derive asymptotic standard errors for the "full" ML

estimators  and  so  have  no  analytical  norms  of  comparison  even  in  very 

large  samples.   We  have  some  Monte  Carlo  evidence  on  these  questions  to 

report,  however,  which  is  the  province  of  the  next  section. 


4 
See  Appendix  B,  equation  (B.3). 
4.   Monte Carlo Results.

Study 1. Since the primary motivation for this study lies in production
function "frontier" estimation, the first of two Monte Carlo
investigations focuses on such an application. The data were taken from
Aigner and Chu (1968) and correspond to an earlier study by Hildebrand
and Liu of the primary metals industry.

A constant term ($x_{i1} = 1$, i = 1, ..., n) and two independent variables
($x_{i2}$ and $x_{i3}$) were included in the model. All data are state
aggregates, with $x_{i2}$ corresponding to the natural logarithm of labor
and $x_{i3}$ corresponding to the product of the natural logarithms of
lagged (one year) capital and the lagged (one year) ratio of the value of
equipment to plant. The dependent variable is the value added for output.⁵

Selecting $\beta_1 = .98$, $\beta_2 = .90$, and $\beta_3 = .03$ as close
approximations to the parameter estimates found in the Aigner-Chu article,
dependent variable observations were generated for various drawings of
$\varepsilon$. Letting $\varepsilon_i^*$ ~ NID(0, .^245), twenty-eight
observations on $\varepsilon$ were generated from (2) for $\theta = 1/4$
and $\theta = 1/3$.⁶ Two different weighting schemes were then used in
posing the quadratic programming problem (10), assuming $\theta$ is known
at the outset: the first corresponds to the inconsistent minimum distance
estimator derived from (8), whereas the second uses square root weights,
as in (14), and produces a consistent estimator of all elements of
$\beta$, including the intercept. (Both yield unbiased estimates of slope
coefficients.) Lastly, the "full frontier" case was investigated by letting
$\theta = 1$, in which event the "negative" truncated normal distribution
is used to generate residuals.

5
This model corresponds to that of Aigner and Chu (1968, p. 835,
eq. (4.1)) which was used to estimate a "full frontier."

6
Note that a selection of $\theta$ is symmetric on either side of
$\theta = 1/2$, so that the cases $\theta = 2/3$ and $\theta = 3/4$ are
covered by the results reported for $\theta = 1/3$ and $\theta = 1/4$,
respectively, except for the fact that the signs of biases are reversed.

In this first set of results, therefore, no iterative procedure is
considered. We take $\theta$ as known, and wish to compare the minimum
distance estimators from (8) and (14), along with OLS and "corrected OLS"
for $\beta_1$ (from equation (13)).

Summary results on 100 replications of samples with the three different
values of $\theta$ are reported in Tables 1 to 3. Relative performance in
estimating the intercept is probably the most interesting comparison,
where it is seen that (up to the accuracy available from 100 replications)
the minimum distance estimator (14) has smallest (absolute) bias of all the
estimators and smaller root mean square error (RMSE) than the other QP
estimator. Though biased downward, the OLS and "corrected OLS" estimators
for $\beta_1$ compare favorably on the RMSE criterion. In this particular
example, the correction to OLS brings the estimated intercept very close to
the consistent estimator from (14). Much the same conclusion follows from
Tables 2 and 3. Our consistent minimum distance estimator (14) shows a
slight improvement over OLS in RMSE. (Recall, all estimators are unbiased
for $\beta_2$ and $\beta_3$.)

As  a  check  on  the  accuracy  of  our  empirical  frequency  distributions 

based  on  100  replications,  we  also  ran  additional  samples,  up  to  a  total 


Table 1

Monte Carlo Results for $\beta_1$ = .98
n = 28, 100 Replications

                                   Mean     Median    Standard Deviation (RMSE)

Case 1 ($\theta$ = 1/4)
  Equation (8)                     1.11     1.28      2.52   (2.52)
  Equation (14)                    0.933    0.965     1.70   (1.70)
  OLS                              0.705    0.966     1.79   (1.8G)
  "corrected" OLS, Equation (13)   0.930    1.19      1.79   (1.79)

Case 2 ($\theta$ = 1/3)
  Equation (8)                     1.1?     1.36      2.39   (2.40)
  Equation (14)                    0.947    0.973     1.64   (1.64)
  OLS                              0.310    0.90S     1.67   (1.67)
  "corrected" OLS, Equation (13)   0.945    1.043     1.67   (1.67)

Case 3 ($\theta$ = 1)
  ML                               0.897    0.891     0.47S  (0.485)

Table 2

Monte Carlo Results for $\beta_2$ = .90
n = 28, 100 Replications

                                   Mean     Median    Standard Deviation (RMSE)

Case 1 ($\theta$ = 1/4)
  Equation (8)                     0.912    0.908     0.468  (0.468)
  Equation (14)                    0.840    0.842     0.297  (0.303)
  OLS                              0.844    0.834     0.314  (0.315)

Case 2 ($\theta$ = 1/3)
  Equation (8)                     0.942    0.932     0.449  (0.451)
  Equation (14)                    0.844    0.844     0.286  (0.291)
  OLS                              0.846    0.839     0.293  (0.295)

Case 3 ($\theta$ = 1)
  ML                               0.890    0.902     0.095  (0.0955)

Table 3

Monte Carlo Results for $\beta_3$ = .03
n = 28, 100 Replications

                                   Mean     Median    Standard Deviation (RMSE)

Case 1 ($\theta$ = 1/4)
  Equation (8)                     0.025    0.009     0.117  (0.117)
  Equation (14)                    0.047    0.042     0.071  (0.073)
  OLS                              0.046    0.041     0.076  (0.078)

Case 2 ($\theta$ = 1/3)
  Equation (8)                     0.018    0.010     0.113  (0.114)
  Equation (14)                    0.046    0.043     0.069  (0.0708)
  OLS                              0.045    0.039     0.070  (0.072)

Case 3 ($\theta$ = 1)
  ML                               0.033    0.032     0.023  (0.023?)
of 300 replications. The results for the minimum distance estimator (14)
are presented in Table 4, where it is seen that the empirical frequency
distributions are settling down and becoming centered around the true
values.⁷ It would appear that this estimator may, in fact, yield unbiased
estimators of all parameters, although we have not yet been able to prove
that conjecture.

All  this  suggests  that  a  more  elaborate  set  of  Monte  Carlo  results 
might  be  welcome,  to  pinpoint  the  relative  precision  of  various  estimators 
and  to  consider  additional  data  alternatives.   However,  these  are  expensive 
to  obtain  for  the  QP  estimator  (14).   This  is  the  primary  reason  we  do  not 
attempt  to  implement  the  iterative  procedure  that  utilizes  (14)  and  (15). 

The evidence presented so far is not really conclusive. But it does
suggest that, for the case of $\theta$ known, OLS estimators of slope
coefficients in conjunction with the "corrected" OLS intercept estimator
provide satisfactory estimators when compared to the consistent
alternative (but expensive) method from (14). In the next section of Monte
Carlo results, therefore, we explore further the question of the
performance of "corrected OLS" estimation of $\beta_1$ in conjunction with
estimation of $\theta$ through an iterative procedure.

7
The same is true of the results for our inconsistent QP estimator
(8). For example, with 300 replications the means of the empirical
frequency distributions for its estimates of $\beta_1$, $\beta_2$, and
$\beta_3$ are 1.14, 0.901, and 0.0293, respectively.

Study 2. In this experiment we abstract from the regression context
(and any connection with real-world data), and consider the small-sample
behavior of one of the "pseudo-ML" estimators mentioned in section 3,
based on a model of the form $y_i = \mu + \varepsilon_i$. In particular,
the estimation scheme studied uses (15) and (13) in an iteration that
begins by inserting the sample median of the $y_i$'s into (15). The
resulting estimate of $\theta$, say $\hat\theta^{(1)}$, is substituted into
(13) along with the sample mean and standard deviation of the $y_i$'s to
yield the first-round estimate of $\mu$, say $\hat\mu^{(1)}$. The process
is repeated until convergence is obtained, whence the estimate of
$\sigma^2$ is calculated through (17) with the final-round values for
$\theta$ and $\mu$ inserted.
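The iteration just described can be sketched in Python as follows. This is a hedged reading of the scheme, not the authors' program: it starts from the sample median, alternates (15) and (13), applies tolerances of 0.001 and 0.005 with a cap of 50 iterations, and returns nothing when the cap is reached or when all residuals fall on one side of the current location estimate. The function names, the handling of those edge cases, and the simulated sample are assumptions.

```python
import math
import random
import statistics


def theta_from_residuals(resid):
    """Equation (15); returns None if all residuals fall on one side of zero."""
    neg = [e * e for e in resid if e < 0]
    pos = [e * e for e in resid if e > 0]
    if not neg or not pos:
        return None
    a1 = sum(neg) / len(neg)
    a2 = sum(pos) / len(pos)
    return a2 / (a1 + a2)


def iterate_location(y, tol_theta=0.001, tol_mu=0.005, max_iter=50):
    """Iterate (15) and (13) for y_i = mu + eps_i; return (theta, mu, sigma2) or None."""
    ybar = statistics.fmean(y)
    sd = statistics.stdev(y)
    mu = statistics.median(y)               # starting value for mu
    theta = None
    for _ in range(max_iter):
        new_theta = theta_from_residuals([yi - mu for yi in y])
        if new_theta is None:
            return None
        d = math.sqrt(new_theta) - math.sqrt(1.0 - new_theta)
        new_mu = ybar - sd * d / math.sqrt(math.pi - d * d)   # (13) using (12)
        converged = (theta is not None
                     and abs(new_theta - theta) <= tol_theta
                     and abs(new_mu - mu) <= tol_mu)
        theta, mu = new_theta, new_mu
        if converged:
            s_neg = sum((yi - mu) ** 2 for yi in y if yi < mu)
            s_pos = sum((yi - mu) ** 2 for yi in y if yi > mu)
            sigma2 = (theta * s_neg + (1.0 - theta) * s_pos) / len(y)   # (17)
            return theta, mu, sigma2
    return None   # iteration limit reached without convergence


if __name__ == "__main__":
    rng = random.Random(0)
    theta_true, mu_true, sigma = 0.3, 1.0, math.sqrt(0.5)
    y = []
    for _ in range(100):
        e_star = rng.gauss(0.0, sigma)
        scale = math.sqrt(1.0 - theta_true) if e_star > 0 else math.sqrt(theta_true)
        y.append(mu_true + e_star / scale)
    print(iterate_location(y))
```

Returning `None` at the iteration cap keeps non-converging samples separate from converging ones, so a caller can redraw or examine them as a distinct group.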

The results of our experiment are reported in Table 5. 200 replications
were used, with $\varepsilon_i^* \sim N(0, 0.5)$, $\mu = 1$, and various
values of $\theta$. Sample sizes of 10, 20, 50, and 100 are included. The
iteration was stopped when the current values of $\theta$ and $\mu$
differed by no more than 0.001 and 0.005 from their previous values,
respectively, or when the number of iterations reached 50, which happened
frequently. In situations where the iteration limit was reached,
additional samples were drawn until the required number of replications
(of converging cases) was obtained.¹⁰


10
In every case when the iteration limit was reached, the iteration
was in a loop. To some extent looping can be controlled by selection of
different error limits for $\theta$ and $\mu$, but not if a particular
pre-set level of accuracy is to be achieved. In any event, we chose to
base our main results on converging samples and to investigate the
non-converging cases separately.

continued 
Table 5

Monte Carlo Results for Iterated Estimators
Based on (15) and (13) in the Model $y_i = \mu + \varepsilon_i$.
Entries are Means over 200 Replications
(RMSE in parentheses), $\mu = 1$, $\sigma^2 = 0.5$.

                          Sample size
True                n = 10          n = 20          n = 50          n = 100
$\theta$

0.5  $\hat\theta$    .501 (.221)     .493 (.180)     .508 (.126)     .501 (.0846)
     $\hat\mu$       .989 (.363)     1.01 (.264)     .993 (.173)     1.00 (.133)
     $\hat\sigma^2$  .372 (.224)     .424 (.165)     .456 (.102)     .492 (.0679)

0.4  $\hat\theta$    .452 (.220)     .414 (.163)     .407 (.111)     .402 (.0812)
     $\hat\mu$       .929 (.358)     .973 (.285)     1.00 (.175)     1.00 (.128)
     $\hat\sigma^2$  .368 (.223)     .431 (.169)     .465 (.107)     .486 (.0762)

0.3  $\hat\theta$    .367 (.230)     .326 (.163)     .336 (.117)     .311 (.0712)
     $\hat\mu$       .940 (.402)     .926 (.293)     .971 (.194)     .984 (.123)
     $\hat\sigma^2$  .390 (.252)     .453 (.176)     .480 (.108)     .488 (.086)

0.2  $\hat\theta$    .293 (.216)     .271 (.168)     .234 (.0938)    .231 (.0689)
     $\hat\mu$       .857 (.435)     .927 (.305)     .945 (.189)     .959 (.153)
     $\hat\sigma^2$  .452 (.271)     .48cr. (.205)   .508 (.142)     .517 (.103)

0.1  $\hat\theta$    .228 (.?27)     .161 (.126)     .139 (.0760)    .129 (.0558)
     $\hat\mu$       .689 (.682)     .8^8 (.373)     .883 (.272)     .907 (.209)
     $\hat\sigma^2$  .653 (.478)     .593 (.317)     .592 (.239)     .585 (.185)
As we would expect, the results show that $\mu$ is estimated with smaller
bias and better precision as n increases for given $\theta$ and as
$\theta$ approaches 0.5 for given n. (Recall, as $\theta$ moves away from
0.5 the distribution of $\varepsilon_i$ is becoming more dispersed.)
Similar behavior for our estimator of $\theta$ is apparent if we use
relative precision as an indicator of efficiency rather than absolute
precision. $\sigma^2$ seems to be generally underestimated (the exception
is $\theta = 0.1$) while $\theta$ is generally overestimated for any
sample size.


Footnote  10  continued 

There seems to be little difference from the converging cases
either when the last value is used as the estimate or when we take
the average value over the loop. For example, for n = 20, $\theta = 0.5$,
the averaged $\hat\theta$'s over non-converging samples are .487 (.184)
and .436 (.182) for the last value and average over the loop,
respectively, as compared to .493 (.180) as reported in Table 5.

Below is a table of experimental relative frequencies of occurrence
of non-converging samples based on the first 100 replications that went
into  Table  5.   There  does  seem  to  be  a  tendency  for  the  proportion  of  non- 


8 

0.5 
0.4 
0.3 
0.2
0.1 


n 

10 

20 

50 

.34 

.36 

.39 

.32 

.49 

.46 

.37 

.38 

.39 

.32 

.38 

.31 

.26 

.28 

.30 

100 


.46 
.41 
.33 
.29 
.17 


converging  cases  to  fall  off  in  a  southeasterly  direction,  but  there  are 
many  exceptions  to  that  general  observation. 
5.   Conclusions.

After  a  rather  lengthy  paper,  it  is  most  appropriate  to  keep  this 
final  section  brief.   To  do  so,  we  summarize  our  results  as  follows. 

Insofar  as  the  practical  matters  of  specifying  and  implementing  an 
econometric  framework  within  which  something  other  than  a  "full"  frontier 
function  can  be  estimated  are  concerned,  we  have  presented  such  a  framework 
and  have  evaluated  (though  incompletely)  the  properties  of  a  variety  of 
procedures  that  might  be  used.   In  effect,  we  have  produced  a  "theory  of 
outliers"  for  the  frontier  function.  The  upshot  of  our  efforts >  including 
our  own  evaluation  of  the  findings,  may  be  criticized  as  being  "much  ado 
about  nothing",  in  that  the  substance  .of  our  recommendations  Involves 
nothing  more  than  an  "adjustment"  of  the  regression  intercept  and,  after 
all,  who  is  interested  in  its  value  anyway?  Two  responses  are  appropriate 
here:   First,  for  forecasting  purposes  the  intercept  is  important.   Second, 
along  with  the  intercept  adjustment  comes  explicit  "placement"  of  the 
function,  which,  of  course,  was  the  goal  cf  this  exercise  to  begin  with. 

A more attractive criticism would question our approach to the problem
as being obtuse. For, after all, why can't one begin with an explicit
statement of the error process mentioned after equation (8)? In this
"explicit" formulation symmetric measurement error might take the traditional
normal form, whereas technological differences among firms may be represented
by (e.g.) the negative truncated normal. Our preliminary study of the
resulting likelihood function shows it is of the same form as the likelihood
function used by Amemiya (1973) in his work on the Tobit model, but with
differences in how parameters enter. In this context it is also apparent
that an equivalent to our $\theta$ can be estimated along with other model
parameters, and is readily interpreted as an indicator of relative variability.

The admission that there is a more direct means by which to capture
the behavioral underpinnings of our problem is not meant to detract either
from the interesting theoretical results obtained in Appendix B or from the
possibility that the estimators discussed here may still be preferable to
those derived from this alternative statistical model.
Appendix A:   Equivalence of equations (9) and (10)

To prove the equivalence of (10) with the original problem (9) requires
showing that at a solution point for (10), (a) $\min(\varepsilon_i^+, \varepsilon_i^-) = 0$
for all $i$ and (b) $\min(\beta_j^+, \beta_j^-) = 0$ for all $j$. (Actually, the
technique of replacing a variable which is to be unrestricted in sign by the
difference of two nonnegative variables is a standard operational procedure
in mathematical programming.)

To demonstrate (a), suppose $\min(\varepsilon_i^+, \varepsilon_i^-) = a > 0$ for some
$i = m$ at a solution point

$$S^*_{opt} = \min \Big\{ \theta \sum_i (-\varepsilon_i^-)^2 + (1-\theta) \sum_i (\varepsilon_i^+)^2 \Big\}.$$

Since $S^*$ is strictly quasi-convex in the $\varepsilon$-variables and the
constraint set is linear, such a solution exists and is unique. Then, consider
a "new" solution with coordinates $(\varepsilon_m^+ - a)$ and $(\varepsilon_m^- - a)$
replacing $\varepsilon_m^+$ and $\varepsilon_m^-$, respectively. The constraints are
obviously satisfied for this solution and the value of the objective function
will be smaller than $S^*_{opt}$, which contradicts the original supposition.
Hence $\min(\varepsilon_i^+, \varepsilon_i^-) = 0$ for all $i$.

As to the second proposition, let $\min(\beta_j^+, \beta_j^-) = \gamma > 0$ for some
$j = \ell$, again at a solution point $S^*_{opt}$, and consider replacing
$\beta_\ell^+$ and $\beta_\ell^-$ by $(\beta_\ell^+ - \gamma)$ and $(\beta_\ell^- - \gamma)$,
respectively. Clearly the constraints are satisfied for these modified
coordinates, but the value of the objective function is left unchanged. To
see this, at an index $i = m$ suppose $\varepsilon_m^+ = 0$ (hence $\varepsilon_m^- \ge 0$).
Then $y_m = X_m'\beta^+ - X_m'\beta^- - \varepsilon_m^-$, where $X_m'$ is the $m$th row
of $X$, and

$$(\varepsilon_m^-)^2 = (y_m - X_m'\beta^+ + X_m'\beta^-)^2.$$

Replacing $\beta_\ell^+$ and $\beta_\ell^-$ by $(\beta_\ell^+ - \gamma)$ and
$(\beta_\ell^- - \gamma)$ leaves $(\varepsilon_m^-)^2$ unaffected. Since $S^*$ will be
strictly quasi-convex in the $\beta$-variables for $X'X$ non-singular, this
conclusion contradicts the uniqueness of a solution $S^*_{opt}$. Since
$\beta_j^+, \beta_j^- \ge 0$ we conclude that $\min(\beta_j^+, \beta_j^-) = 0$ for all $j$.
Appendix B:   Estimation of the Parameters of a Discontinuous Density Function

In this Appendix we develop the main theoretical results used in sections
2 and 3 of the paper. The argument is couched in terms of a simplified
version of the model (1), namely when there is only a column of ones in $X$
and the goal is estimation of the mean of $y_i$, say $\mu$. Extension to the
regression case is apparent.

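Suppose a random variable $y$ has the density

$$f(y) = \begin{cases}
\dfrac{\sqrt{\theta_0}}{\sqrt{2\pi}\,\sigma_0} \exp\Big[-\dfrac{\theta_0}{2\sigma_0^2}(y-\mu_0)^2\Big] & \text{for } -\infty < y \le \mu_0, \\[2ex]
\dfrac{\sqrt{1-\theta_0}}{\sqrt{2\pi}\,\sigma_0} \exp\Big[-\dfrac{1-\theta_0}{2\sigma_0^2}(y-\mu_0)^2\Big] & \text{for } \mu_0 < y < \infty.
\end{cases} \tag{B.1}$$

We want to consider the estimation of $\mu_0$, $\sigma_0^2$, and $\theta_0$ on the basis
of $n$ independent observations $\{y_i,\ i = 1, 2, \ldots, n\}$ on $y$.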
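As a rough illustration of the density (B.1), the following sketch draws from it by picking a side with probability 1/2 and then drawing a half-normal deviate with variance $\sigma_0^2/\theta_0$ below $\mu_0$ or $\sigma_0^2/(1-\theta_0)$ above it. The function names and the parameter values are assumptions made for this example only; they do not come from the paper.

```python
import math
import random

def sample_two_piece(n, mu0, sigma0, theta0, rng):
    """Draw n values from the two-piece normal density (B.1).

    Each half of the density carries probability 1/2; the lower half is a
    half-normal with variance sigma0**2 / theta0, the upper half a
    half-normal with variance sigma0**2 / (1 - theta0).
    """
    draws = []
    for _ in range(n):
        if rng.random() < 0.5:
            scale = sigma0 / math.sqrt(theta0)
            draws.append(mu0 - abs(rng.gauss(0.0, scale)))
        else:
            scale = sigma0 / math.sqrt(1.0 - theta0)
            draws.append(mu0 + abs(rng.gauss(0.0, scale)))
    return draws

def density(y, mu0, sigma0, theta0):
    """Evaluate the density (B.1) at y."""
    w = theta0 if y <= mu0 else 1.0 - theta0
    return (math.sqrt(w) / (math.sqrt(2.0 * math.pi) * sigma0)
            * math.exp(-w * (y - mu0) ** 2 / (2.0 * sigma0 ** 2)))

# Illustrative (hypothetical) parameter values.
rng = random.Random(0)
ys = sample_two_piece(20000, mu0=1.0, sigma0=2.0, theta0=0.2, rng=rng)
share_below = sum(y <= 1.0 for y in ys) / len(ys)
print("share of draws at or below mu0:", share_below)
```

Because each half carries probability 1/2, $\mu_0$ is the median of $y$, and the density jumps at $\mu_0$ by the factor $\sqrt{\theta_0/(1-\theta_0)}$.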

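The log likelihood function of $\{y_i,\ i = 1, 2, \ldots, n\}$ is

$$\ln L = -\frac{n}{2}\ln 2\pi + \frac{n_1}{2}\ln\theta + \frac{n_2}{2}\ln(1-\theta) - \frac{n}{2}\ln\sigma^2
 - \frac{\theta}{2\sigma^2}\sum_1 (y_i-\mu)^2 - \frac{1-\theta}{2\sigma^2}\sum_2 (y_i-\mu)^2, \tag{B.2}$$

where $\sum_1$ means the summation over $\{i \mid y_i \le \mu\}$, $\sum_2$ means the
summation over $\{i \mid y_i > \mu\}$, and $n_1$ and $n_2$ are the numbers of terms in
$\sum_1$ and $\sum_2$ respectively.

Maximizing (B.2) for $\sigma^2$, we obtain

$$\sigma^2 = \frac{1}{n}\Big[\theta \sum_1 (y_i-\mu)^2 + (1-\theta)\sum_2 (y_i-\mu)^2\Big]. \tag{B.3}$$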
Inserting (B.3) into (B.2), we obtain the concentrated log likelihood
function in $\theta$ and $\mu$:

$$\ln L = -\frac{n}{2}\ln\Big\{\frac{1}{n}\Big[\theta\sum_1 (y_i-\mu)^2 + (1-\theta)\sum_2 (y_i-\mu)^2\Big]\Big\}
 + \frac{n_1}{2}\ln\theta + \frac{n_2}{2}\ln(1-\theta), \tag{B.4}$$

corresponding to (5) in the text.

Maximizing (B.4) for $\theta$, we obtain

$$\theta = \frac{n_2^{-1}\sum_2 (y_i-\mu)^2}{n_2^{-1}\sum_2 (y_i-\mu)^2 + n_1^{-1}\sum_1 (y_i-\mu)^2}. \tag{B.5}$$
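The step from (B.4) to (B.5) can be sketched as follows. The shorthand $A = \sum_1 (y_i-\mu)^2$, $B = \sum_2 (y_i-\mu)^2$ and the constant $c$ are introduced here only for this illustration and are not the paper's notation.

```latex
% First-order condition of (B.4) with respect to \theta:
\frac{\partial \ln L}{\partial \theta}
  = -\frac{n}{2}\,\frac{A - B}{\theta A + (1-\theta) B}
    + \frac{n_1}{2\theta} - \frac{n_2}{2(1-\theta)} = 0 .

% Try \theta = c\,n_1/A and 1-\theta = c\,n_2/B, so that
% \theta A + (1-\theta) B = c\,(n_1 + n_2) = c\,n. The condition becomes
-\frac{A - B}{2c} + \frac{A}{2c} - \frac{B}{2c} = 0 ,

% which holds identically. Adding the two trial expressions gives
% c = (n_1/A + n_2/B)^{-1}, hence
\theta = \frac{n_1/A}{n_1/A + n_2/B}
       = \frac{B/n_2}{B/n_2 + A/n_1} ,
% which is (B.5).
```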


Finally, inserting (B.5) into (B.4), we obtain the concentrated log
likelihood function in $\mu$:

$$-\ln L = \frac{n_1}{2}\ln\frac{\sum_1 (y_i-\mu)^2}{n_1} + \frac{n_2}{2}\ln\frac{\sum_2 (y_i-\mu)^2}{n_2}. \tag{B.6}$$

Thus, the maximum likelihood estimate of $\mu$, $\hat\mu$, is that value of $\mu$
which minimizes

$$Q = \frac{n_1}{n}\ln\frac{\sum_1 (y_i-\mu)^2}{n_1} + \frac{n_2}{n}\ln\frac{\sum_2 (y_i-\mu)^2}{n_2}. \tag{B.7}$$
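A minimal sketch of how (B.7), (B.5), and (B.3) fit together numerically is given below. It simulates data from (B.1), minimizes $Q$ over a plain grid of candidate values of $\mu$, and then recovers $\theta$ and $\sigma^2$. The grid search, the restriction of the grid to the central 80 percent of the sample, the sample size, and all function names are assumptions made for this example; the paper does not prescribe a search method here.

```python
import math
import random

def draw_sample(n, mu0, sigma0, theta0, rng):
    """Simulate from the two-piece normal density (B.1)."""
    out = []
    for _ in range(n):
        if rng.random() < 0.5:
            out.append(mu0 - abs(rng.gauss(0.0, sigma0 / math.sqrt(theta0))))
        else:
            out.append(mu0 + abs(rng.gauss(0.0, sigma0 / math.sqrt(1.0 - theta0))))
    return out

def split_sums(ys, mu):
    """Counts and sums of squared deviations for y <= mu and for y > mu."""
    low = [(y - mu) ** 2 for y in ys if y <= mu]
    high = [(y - mu) ** 2 for y in ys if y > mu]
    return len(low), sum(low), len(high), sum(high)

def q_criterion(ys, mu):
    """Q of (B.7); +inf if one side is empty or has zero spread."""
    n1, ss1, n2, ss2 = split_sums(ys, mu)
    if n1 == 0 or n2 == 0 or ss1 <= 0.0 or ss2 <= 0.0:
        return math.inf
    n = n1 + n2
    return (n1 / n) * math.log(ss1 / n1) + (n2 / n) * math.log(ss2 / n2)

def profile_mle(ys, grid_points=400):
    """Grid minimiser of Q, then theta from (B.5) and sigma^2 from (B.3)."""
    srt = sorted(ys)
    lo, hi = srt[len(srt) // 10], srt[(9 * len(srt)) // 10]
    grid = [lo + (hi - lo) * k / (grid_points - 1) for k in range(grid_points)]
    mu_hat = min(grid, key=lambda m: q_criterion(ys, m))
    n1, ss1, n2, ss2 = split_sums(ys, mu_hat)
    theta_hat = (ss2 / n2) / (ss2 / n2 + ss1 / n1)
    sigma2_hat = (theta_hat * ss1 + (1.0 - theta_hat) * ss2) / (n1 + n2)
    return mu_hat, theta_hat, sigma2_hat

# Illustrative (hypothetical) true values: mu0 = 0, sigma0 = 1, theta0 = 0.1.
rng = random.Random(1)
ys = draw_sample(3000, mu0=0.0, sigma0=1.0, theta0=0.1, rng=rng)
mu_hat, theta_hat, sigma2_hat = profile_mle(ys)
print(mu_hat, theta_hat, sigma2_hat)
```

The grid is kept away from the extreme order statistics because $Q$ can become very small when one of the two sums contains only a few nearly zero terms.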

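We now show the consistency of $\hat\mu$. We have

$$\operatorname{plim} Q = (1-P)\cdot\ln E[(y_i-\mu)^2 \mid y_i \le \mu] + P\cdot\ln E[(y_i-\mu)^2 \mid y_i > \mu], \tag{B.8}$$

where $P = P(y_i > \mu)$. Let us assume $\mu \ge \mu_0$. Then we have

$$P = \int_0^\infty \frac{\sqrt{1-\theta_0}}{\sqrt{2\pi}\,\sigma_0}
    \exp\Big\{-\frac{1-\theta_0}{2\sigma_0^2}\,[z - (\mu_0-\mu)]^2\Big\}\,dz, \tag{B.9}$$

$$\begin{aligned}
P\cdot E[(y_i-\mu)^2 \mid y_i > \mu]
 &= \int_0^\infty \frac{\sqrt{1-\theta_0}}{\sqrt{2\pi}\,\sigma_0}\, z^2
    \exp\Big\{-\frac{1-\theta_0}{2\sigma_0^2}\,[z - (\mu_0-\mu)]^2\Big\}\,dz \\
 &= -\frac{\sigma_0(\mu-\mu_0)}{\sqrt{2\pi(1-\theta_0)}}
    \exp\Big[-\frac{1-\theta_0}{2\sigma_0^2}(\mu_0-\mu)^2\Big] \\
 &\quad + \Big[\frac{\sigma_0^2}{1-\theta_0} + (\mu-\mu_0)^2\Big]
    \frac{1}{\sqrt{2\pi}} \int_{\frac{\sqrt{1-\theta_0}}{\sigma_0}(\mu-\mu_0)}^{\infty}
    \exp\Big(-\frac{1}{2}z^2\Big)\,dz,
\end{aligned} \tag{B.10}$$

and

$$\begin{aligned}
(1-P)\cdot E[(y_i-\mu)^2 \mid y_i \le \mu]
 &= \int_{-\infty}^{\mu_0-\mu} \frac{\sqrt{\theta_0}}{\sqrt{2\pi}\,\sigma_0}\, z^2
    \exp\Big\{-\frac{\theta_0}{2\sigma_0^2}\,[z - (\mu_0-\mu)]^2\Big\}\,dz \\
 &\quad + \int_{\mu_0-\mu}^{0} \frac{\sqrt{1-\theta_0}}{\sqrt{2\pi}\,\sigma_0}\, z^2
    \exp\Big\{-\frac{1-\theta_0}{2\sigma_0^2}\,[z - (\mu_0-\mu)]^2\Big\}\,dz \\
 &= \frac{\sigma_0^2}{2\theta_0} + \frac{(\mu-\mu_0)^2}{2}
    + \frac{2\sigma_0(\mu-\mu_0)}{\sqrt{2\pi\theta_0}}
    - \frac{2\sigma_0(\mu-\mu_0)}{\sqrt{2\pi(1-\theta_0)}} \\
 &\quad + \frac{\sigma_0(\mu-\mu_0)}{\sqrt{2\pi(1-\theta_0)}}
    \exp\Big[-\frac{1-\theta_0}{2\sigma_0^2}(\mu_0-\mu)^2\Big] \\
 &\quad + \Big[\frac{\sigma_0^2}{1-\theta_0} + (\mu-\mu_0)^2\Big]
    \frac{1}{\sqrt{2\pi}} \int_0^{\frac{\sqrt{1-\theta_0}}{\sigma_0}(\mu-\mu_0)}
    \exp\Big(-\frac{1}{2}z^2\Big)\,dz.
\end{aligned} \tag{B.11}$$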
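From (B.8), (B.9), (B.10), and (B.11) we obtain

$$\Big[\frac{\partial}{\partial\mu}\operatorname{plim} Q\Big]_{\mu=\mu_0}
 = \frac{1}{\sqrt{2\pi}\,\sigma_0}\Big\{2\big(\sqrt{\theta_0} - \sqrt{1-\theta_0}\big)
 + \sqrt{1-\theta_0}\,[\ln(1-\theta_0) - \ln\theta_0]\Big\}. \tag{B.12}$$

But the right-hand side of (B.12) can be written as

$$\frac{2\sqrt{1-\theta_0}}{\sqrt{2\pi}\,\sigma_0}\,(x - \ln x - 1), \tag{B.13}$$

where $x = \sqrt{\theta_0}/\sqrt{1-\theta_0}$. Since we have

$$\frac{d(x - \ln x - 1)}{dx} \gtreqless 0 \quad\text{as}\quad x \gtreqless 1, \tag{B.14}$$

expression (B.13) is nonnegative. Next, consider the case $\mu \le \mu_0$. Since

$$\operatorname{plim} Q(\theta_0, \mu) = \operatorname{plim} Q(1-\theta_0,\ 2\mu_0 - \mu), \tag{B.15}$$

we have

$$\Big[\frac{\partial}{\partial\mu}\operatorname{plim} Q\Big]_{\mu=\mu_0}
 = -\frac{1}{\sqrt{2\pi}\,\sigma_0}\Big\{2\big(\sqrt{1-\theta_0} - \sqrt{\theta_0}\big)
 + \sqrt{\theta_0}\,[\ln\theta_0 - \ln(1-\theta_0)]\Big\}. \tag{B.16}$$

But the right-hand side of (B.16) is nonpositive for the same reason that
(B.12) is nonnegative. Thus we have shown that plim $Q$ attains a minimum
at $\mu = \mu_0$.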
It is easy to show from (B.3) and (B.5) that the maximum likelihood
estimators of $\sigma^2$ and $\theta$ are consistent if a consistent estimator of $\mu$
is inserted for $\mu$ in the right-hand sides of (B.3) and (B.5).

Since the minimum of plim $Q$ is attained at the point where the derivative
does not vanish, the asymptotic distribution of the maximum likelihood
estimator cannot be obtained by the usual method using a Taylor expansion.
Chernoff and Rubin (1956) show that $n(\hat\mu - \mu_0)$ has a proper limit
distribution but do not obtain it in a closed form. Thus, we are unable
to report the proper formulas for the asymptotic standard errors of $\hat\mu$,
$\hat\theta$, and $\hat\sigma^2$.

This proof of the consistency of the maximum likelihood estimator
carries through directly to the regression case where $\mu_0$ is replaced with
$X_i'\beta_0$ in (B.1), but we do not reproduce that proof here.
The foregoing considerations suggest two potentially useful iterative
procedures for determining "pseudo-ML" estimates of parameters.


Method 1.   "Corrected least squares".

This method utilizes the first two sample moments of the $\{y_i\}$,
namely $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2$.
We know from (3) and (4) that

$$E(y) = \mu + \frac{\sigma}{\sqrt{2\pi}}\,\frac{\sqrt{\theta} - \sqrt{1-\theta}}{\sqrt{\theta}\,\sqrt{1-\theta}} \tag{B.17}$$

and

$$V(y) = \frac{\sigma^2}{2\pi\,\theta(1-\theta)}\Big[\pi - \big(\sqrt{\theta} - \sqrt{1-\theta}\big)^2\Big]. \tag{B.18}$$

These two equations suggest the following estimator for $\mu$:

$$\tilde\mu = \bar y - \frac{s\,\big(\sqrt{\theta} - \sqrt{1-\theta}\big)}{\big[\pi - \big(\sqrt{\theta} - \sqrt{1-\theta}\big)^2\big]^{1/2}}, \tag{B.19}$$

which is identical to (13) in the text.

The suggested iteration begins by inserting the sample median, which
is a consistent estimator of $\mu$, into the right-hand side of (B.5). Second,
insert the estimate of $\theta$ thus obtained into the right-hand side of (B.19)
to obtain the second-round estimate of $\mu$. Repeat the iteration until it
converges. Finally, insert the converged values of $\mu$ and $\theta$ into the
right-hand side of (B.3) to obtain an estimate of $\sigma^2$.
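The iteration just described can be sketched as follows. This is an illustrative reading of Method 1, not the authors' program: the stopping tolerance, the iteration cap, the simulated data, and every function name are assumptions made for the example. The routine reports whether the tolerance was met rather than assuming that it always is.

```python
import math
import random
import statistics

def draw_sample(n, mu0, sigma0, theta0, rng):
    """Simulate from the two-piece normal density (B.1)."""
    out = []
    for _ in range(n):
        if rng.random() < 0.5:
            out.append(mu0 - abs(rng.gauss(0.0, sigma0 / math.sqrt(theta0))))
        else:
            out.append(mu0 + abs(rng.gauss(0.0, sigma0 / math.sqrt(1.0 - theta0))))
    return out

def theta_from_mu(ys, mu):
    """Equation (B.5); assumes observations on both sides of mu."""
    low = [(y - mu) ** 2 for y in ys if y <= mu]
    high = [(y - mu) ** 2 for y in ys if y > mu]
    m1 = sum(low) / len(low)
    m2 = sum(high) / len(high)
    return m2 / (m2 + m1)

def mu_from_theta(ybar, s, theta):
    """Equation (B.19): moment-corrected estimate of mu."""
    d = math.sqrt(theta) - math.sqrt(1.0 - theta)
    return ybar - s * d / math.sqrt(math.pi - d * d)

def corrected_least_squares(ys, tol=1e-8, max_iter=200):
    ybar = statistics.fmean(ys)
    s = statistics.stdev(ys)            # n - 1 divisor, as in the text
    mu = statistics.median(ys)          # consistent starting value
    converged = False
    for _ in range(max_iter):
        theta = theta_from_mu(ys, mu)
        mu_new = mu_from_theta(ybar, s, theta)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    theta = theta_from_mu(ys, mu)
    ss1 = sum((y - mu) ** 2 for y in ys if y <= mu)
    ss2 = sum((y - mu) ** 2 for y in ys if y > mu)
    sigma2 = (theta * ss1 + (1.0 - theta) * ss2) / len(ys)   # (B.3)
    return mu, theta, sigma2, converged

# Illustrative (hypothetical) true values: mu0 = 0, sigma0 = 1, theta0 = 0.2.
rng = random.Random(2)
ys = draw_sample(4000, mu0=0.0, sigma0=1.0, theta0=0.2, rng=rng)
mu_t, theta_t, sigma2_t, ok = corrected_least_squares(ys)
print(mu_t, theta_t, sigma2_t, ok)
```

Since (B.5) changes by small jumps whenever $\mu$ passes an observation, the loop may oscillate within a narrow band instead of meeting a tight tolerance; the returned flag makes that visible.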

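Using (B.9), (B.10), and (B.11), we can show

$$\Big[\frac{\partial}{\partial\mu}\operatorname{plim}\hat\theta\Big]_{\mu=\mu_0}
 = -\frac{4\,\theta_0^2\,(1-\theta_0)}{\sigma_0\sqrt{2\pi\theta_0}} \tag{B.20}$$

and

$$\Big[\frac{\partial}{\partial\theta}\operatorname{plim}\tilde\mu\Big]_{\theta=\theta_0}
 = -\frac{\sqrt{\pi}\,\sigma_0\,\big(\sqrt{\theta_0} + \sqrt{1-\theta_0}\big)}
        {2\sqrt{2}\,\theta_0(1-\theta_0)\,\big[\pi - \big(\sqrt{\theta_0} - \sqrt{1-\theta_0}\big)^2\big]}. \tag{B.21}$$

Since the product of (B.20) and (B.21) is less than 1 in absolute value,
the suggested iteration using (B.5) and (B.19) is asymptotically stable
in the neighborhood of the true values.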

Method 2.   "Minimum Distance".

The second method is based on the minimum distance estimator of $\mu$
that minimizes

$$S = \frac{1}{n}\Big[\sqrt{\theta}\,\sum_1 (y_i-\mu)^2 + \sqrt{1-\theta}\,\sum_2 (y_i-\mu)^2\Big] \tag{B.22}$$

for a given value of $\theta$. The suggested iteration is as follows: First,
substitute the sample median for $\mu$ in the right-hand side of (B.5). Second,
insert the estimate of $\theta$ thus obtained into the right-hand side of (B.22)
and minimize $S$ with respect to $\mu$ to obtain the second-round estimate of $\mu$.
Repeat the iteration until it converges. $S$ is continuous in $\mu$ and between
the adjacent values of $y_i$ it is smooth. Therefore, it is easy to search
for a minimum in the neighborhood of the value of $\mu$ obtained in the preceding
round of the iteration.
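A hedged sketch of Method 2 follows. For a fixed $\theta$, each term of $S$ in (B.22) is an asymmetric quadratic in $\mu$, so $S$ is convex and its derivative is nondecreasing; the example uses that fact to locate the minimum by bisection on the derivative. The bisection itself, the tolerance, the iteration caps, the simulated data, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math
import random
import statistics

def draw_sample(n, mu0, sigma0, theta0, rng):
    """Simulate from the two-piece normal density (B.1)."""
    out = []
    for _ in range(n):
        if rng.random() < 0.5:
            out.append(mu0 - abs(rng.gauss(0.0, sigma0 / math.sqrt(theta0))))
        else:
            out.append(mu0 + abs(rng.gauss(0.0, sigma0 / math.sqrt(1.0 - theta0))))
    return out

def theta_from_mu(ys, mu):
    """Equation (B.5); assumes observations on both sides of mu."""
    low = [(y - mu) ** 2 for y in ys if y <= mu]
    high = [(y - mu) ** 2 for y in ys if y > mu]
    m1 = sum(low) / len(low)
    m2 = sum(high) / len(high)
    return m2 / (m2 + m1)

def argmin_s(ys, theta, iters=50):
    """Minimise S of (B.22) over mu by bisection on dS/dmu."""
    a, b = math.sqrt(theta), math.sqrt(1.0 - theta)

    def slope(mu):
        # dS/dmu up to the positive factor 2/n; nondecreasing in mu.
        return sum((a if y <= mu else b) * (mu - y) for y in ys)

    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def minimum_distance(ys, tol=1e-6, max_iter=50):
    mu = statistics.median(ys)          # first-round value
    converged = False
    for _ in range(max_iter):
        theta = theta_from_mu(ys, mu)   # (B.5) at the current mu
        mu_new = argmin_s(ys, theta)    # minimise (B.22) at that theta
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, theta_from_mu(ys, mu), converged

# Illustrative (hypothetical) true values: mu0 = 0, sigma0 = 1, theta0 = 0.2.
rng = random.Random(3)
ys = draw_sample(2000, mu0=0.0, sigma0=1.0, theta0=0.2, rng=rng)
mu_md, theta_md, ok_md = minimum_distance(ys)
print(mu_md, theta_md, ok_md)
```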

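Let $S_0$ be the value of $S$ evaluated at $\theta = \theta_0$ and define
$R = \frac{\sqrt{2\pi}}{\sigma_0^2} S_0$. Then we have

$$\operatorname{plim} R = \frac{\sqrt{2\pi}}{\sigma_0^2}\Big\{\sqrt{\theta_0}\,(1-P)\,E[(y_i-\mu)^2 \mid y_i \le \mu]
 + \sqrt{1-\theta_0}\,P\,E[(y_i-\mu)^2 \mid y_i > \mu]\Big\}. \tag{B.23}$$

Assuming $\mu \ge \mu_0$, writing $w = (\mu-\mu_0)/\sigma_0$, and using (B.9), (B.10),
and (B.11), we have

$$\begin{aligned}
\operatorname{plim} R
 &= -w\exp\Big[-\frac{1}{2}(1-\theta_0)w^2\Big]
  + \Big[\frac{1}{\sqrt{1-\theta_0}} + \sqrt{1-\theta_0}\,w^2\Big]
    \int_{\sqrt{1-\theta_0}\,w}^{\infty} \exp\Big(-\frac{1}{2}z^2\Big)\,dz + 2w \\
 &\quad + \frac{\sqrt{2\pi}}{2\sqrt{\theta_0}} + \frac{\sqrt{2\pi\theta_0}}{2}\,w^2
  - \frac{2\sqrt{\theta_0}}{\sqrt{1-\theta_0}}\,w
  + \frac{\sqrt{\theta_0}}{\sqrt{1-\theta_0}}\,w\exp\Big[-\frac{1-\theta_0}{2}w^2\Big] \\
 &\quad + \sqrt{\theta_0}\,\Big[\frac{1}{1-\theta_0} + w^2\Big]
    \int_0^{\sqrt{1-\theta_0}\,w} \exp\Big(-\frac{1}{2}z^2\Big)\,dz.
\end{aligned} \tag{B.24}$$

Therefore,

$$\Big[\frac{\partial}{\partial w}\operatorname{plim} R\Big]_{w=0} = 0 \tag{B.25}$$

and

$$\Big[\frac{\partial^2}{\partial w^2}\operatorname{plim} R\Big]_{w=0}
 = \sqrt{2\pi}\,\big(\sqrt{\theta_0} + \sqrt{1-\theta_0}\big) > 0. \tag{B.26}$$

Therefore, the minimum distance estimator of $\mu$ that minimizes $S_0$ is consistent.

It is interesting to note that a similar argument applied to the
distance function

$$S = \theta\sum_1 (y_i-\mu)^2 + (1-\theta)\sum_2 (y_i-\mu)^2, \tag{B.27}$$

which is the quadratic term in the concentrated log-likelihood (B.4),
results in the conclusion that the minimum distance estimator of $\mu$ which
minimizes (B.27) at $\theta = \theta_0$ is inconsistent.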
These two methods are easily adapted to the regression framework
(although the first is by far the easiest to implement). When we take as
our goal consistent estimation of the intercept, using the OLS estimators
for slope coefficients, then, denoting by $b_2$ the $(k-1) \times 1$ vector of OLS
slope estimates and by $X_{2i}'$ the corresponding $i$th row of $X$ excluding the
first element, one initial consistent estimate of $\beta_1$ is just the sample
median of $\{y_i - X_{2i}'b_2\}$.* This initial estimate can be improved upon
iteratively along with estimation of $\theta$, through (B.19) and (B.5),
or the complete $\beta$-vector can be estimated iteratively along with $\theta$ through
(B.22) and (B.5).


* Another consistent estimate comes from applying the MAD estimator
to estimate $\beta$, which yields an unbiased estimate of the median residual.
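The first of these adaptations might be sketched as follows for a single regressor. OLS supplies the slope, the median of $y_i - X_{2i}'b_2$ starts the intercept, and (B.5) and (B.19) are then iterated on those partial residuals. The one-regressor setup, the simulated data, the use of the $n-1$ divisor for the partial-residual standard deviation, and all names are assumptions made for this example.

```python
import math
import random
import statistics

def simulate(n, beta1, beta2, sigma0, theta0, rng):
    """y = beta1 + beta2 * x + e, with e drawn from (B.1) centred at 0."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        if rng.random() < 0.5:
            e = -abs(rng.gauss(0.0, sigma0 / math.sqrt(theta0)))
        else:
            e = abs(rng.gauss(0.0, sigma0 / math.sqrt(1.0 - theta0)))
        xs.append(x)
        ys.append(beta1 + beta2 * x + e)
    return xs, ys

def ols_slope(xs, ys):
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def theta_from_mu(rs, mu):
    """Equation (B.5) applied to the partial residuals rs."""
    low = [(r - mu) ** 2 for r in rs if r <= mu]
    high = [(r - mu) ** 2 for r in rs if r > mu]
    m1, m2 = sum(low) / len(low), sum(high) / len(high)
    return m2 / (m2 + m1)

def intercept_estimate(xs, ys, tol=1e-8, max_iter=200):
    b2 = ols_slope(xs, ys)
    rs = [y - b2 * x for x, y in zip(xs, ys)]    # y_i - X_2i' b_2
    rbar, s = statistics.fmean(rs), statistics.stdev(rs)
    b1 = statistics.median(rs)                   # initial consistent estimate
    for _ in range(max_iter):
        theta = theta_from_mu(rs, b1)
        d = math.sqrt(theta) - math.sqrt(1.0 - theta)
        b1_new = rbar - s * d / math.sqrt(math.pi - d * d)   # (B.19)
        done = abs(b1_new - b1) < tol
        b1 = b1_new
        if done:
            break
    return b1, b2, theta_from_mu(rs, b1)

# Illustrative (hypothetical) true values.
rng = random.Random(4)
xs, ys = simulate(3000, beta1=2.0, beta2=0.5, sigma0=1.0, theta0=0.2, rng=rng)
b1_hat, b2_hat, theta_hat = intercept_estimate(xs, ys)
print(b1_hat, b2_hat, theta_hat)
```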
Appendix C:   Proof that OLS is optimal for the classical regression model
when residuals are differentiated according to sign.


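When signs of residuals are recognized, the least squares problem
is written

$$\min_{\{\beta,\,\varepsilon\}} \ \sum_{\varepsilon_i \le 0} \varepsilon_i^2 + \sum_{\varepsilon_i > 0} \varepsilon_i^2
 \qquad \text{s.t.}\quad y = X\beta + \varepsilon. \tag{C.1}$$

As we have shown in Appendix A, the above can be equivalently written

$$\begin{aligned}
\min_{\{\beta,\,\varepsilon^+,\,\varepsilon^-\}} \ & \sum_i (-\varepsilon_i^-)^2 + \sum_i (\varepsilon_i^+)^2 \\
\text{s.t.}\quad & y = X\beta + \varepsilon^+ - \varepsilon^-, \\
& \varepsilon^+,\ \varepsilon^- \ge 0, \\
& \min(\varepsilon_i^+, \varepsilon_i^-) = 0 \ \text{ for all } i,
\end{aligned} \tag{C.2}$$

the latter condition being non-binding (i.e. implied by the problem
formulation).

By substituting the model identity $\varepsilon^+ = y - X\beta + \varepsilon^-$, we have

$$\min_{\{\beta,\,\varepsilon^-\}} \ S^* = \sum_{i=1}^n (-\varepsilon_i^-)^2 + \sum_{i=1}^n (y_i - X_i'\beta + \varepsilon_i^-)^2
 \qquad \text{s.t.}\quad \varepsilon_i^- \ge 0 \ \text{ all } i. \tag{C.3}$$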
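Any solution to this problem must meet the Kuhn-Tucker conditions

$$\frac{\partial S^*}{\partial \varepsilon_i^-}\bigg|_{\varepsilon_i^- = e_i^-} \ge 0, \qquad i = 1, \ldots, n, \tag{C.4a}$$

$$\sum_{i=1}^n \frac{\partial S^*}{\partial \varepsilon_i^-}\bigg|_{\varepsilon_i^- = e_i^-} e_i^- = 0, \tag{C.4b}$$

$$\frac{\partial S^*}{\partial \beta_j}\bigg|_{\beta_j = \hat\beta_j} \ge 0, \qquad j = 1, \ldots, k, \tag{C.4c}$$

$$\sum_{j=1}^k \frac{\partial S^*}{\partial \beta_j}\bigg|_{\beta_j = \hat\beta_j} \hat\beta_j = 0, \tag{C.4d}$$

where $e_i^-$ $(i = 1, \ldots, n)$ and $\hat\beta_j$ $(j = 1, \ldots, k)$ are solution values
for $\varepsilon_i^-$ and $\beta_j$ respectively.

OLS will be optimal if, when $\hat\beta = (X'X)^{-1}X'y$, $e = My$ with
$M = I - X(X'X)^{-1}X'$, and the separation of $e$ into $e^+$ and $e^-$ is ex post, the
Kuhn-Tucker conditions are satisfied.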

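Developing the needed expressions for (C.4a) through (C.4d) for the
criterion function (C.3),

$$\frac{\partial S^*}{\partial \varepsilon_i^-} = 2\sum_{i=1}^n (y_i - X_i'\beta + \varepsilon_i^-) - 2\sum_{i=1}^n \varepsilon_i^-,
 \qquad i = 1, \ldots, n, \tag{C.5a}$$

and

$$\frac{\partial S^*}{\partial \beta_j} = -2\sum_{i=1}^n (y_i - X_i'\beta + \varepsilon_i^-)\,x_{ij}
 + 2\sum_{i=1}^n (-y_i + X_i'\beta + \varepsilon_i^+)\,x_{ij},
 \qquad j = 1, \ldots, k. \tag{C.5b}$$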
Now, (C.4a) holds for the OLS solution because $\iota'e = 0$ by construction,
where $\iota$ is the $n \times 1$ vector of unities. To see this, when $e_i \le 0$,
$e_i^- = -e_i \ge 0$ and $e_i^+ = 0$. (C.5a) effectively separates the OLS calculated
residuals as to sign and then sums over all of them, first assigning the
proper negative sign to otherwise positive $e_i^-$'s. Since, from (C.4a),
$\partial S^*/\partial \varepsilon_i^- = 0$ for all $i$ when evaluated at $\varepsilon_i^- = e_i^-$,
(C.4b) follows trivially.

The analysis of (C.4c) and (C.4d) follows in much the same way. Since
for the OLS residuals, $e'X_j = 0$, $j = 1, \ldots, k$, where $X_j$ is the $j$th column of
$X$, (C.5b) is zero for all $j$ when evaluated at the OLS vectors $\hat\beta$ and $e^-$.
(C.4d) is therefore trivially true.

Finally, the qualitative information embodied in recognition of
residual signs when signed residuals are weighted equally has no influence
on the ML solution either. Referring to text equation (5), we see that
when $\theta = 1/2$ the ML criterion function essentially reduces to (C.1).
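The equal-weights case can be checked numerically with a small sketch. It minimizes the sign-weighted sum of squares $\theta\sum_{e_i \le 0} e_i^2 + (1-\theta)\sum_{e_i > 0} e_i^2$ for a one-regressor model by iteratively reweighted least squares and compares the $\theta = 1/2$ fit with OLS. The reweighting scheme, the simulated data, and the function names are assumptions for illustration; the Appendix argues the point through the Kuhn-Tucker conditions instead.

```python
import random

def weighted_fit(xs, ys, weights):
    """Closed-form weighted least squares for y = a + b * x."""
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, xs)) / sw
    ybar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, xs))
    b = sxy / sxx
    return ybar - b * xbar, b

def asymmetric_fit(xs, ys, theta, max_iter=200):
    """Minimise theta * sum_{e<=0} e^2 + (1 - theta) * sum_{e>0} e^2
    by iteratively reweighted least squares, starting from OLS."""
    a, b = weighted_fit(xs, ys, [1.0] * len(xs))
    for _ in range(max_iter):
        w = [theta if y - a - b * x <= 0.0 else 1.0 - theta
             for x, y in zip(xs, ys)]
        a_new, b_new = weighted_fit(xs, ys, w)
        moved = abs(a_new - a) + abs(b_new - b)
        a, b = a_new, b_new
        if moved < 1e-12:
            break
    return a, b

# Hypothetical data, used only to exercise the two fits.
rng = random.Random(5)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

a_ols, b_ols = weighted_fit(xs, ys, [1.0] * len(xs))
a_half, b_half = asymmetric_fit(xs, ys, theta=0.5)
a_low, b_low = asymmetric_fit(xs, ys, theta=0.2)
mean_resid_low = sum(y - a_low - b_low * x for x, y in zip(xs, ys)) / len(xs)
print(a_ols, b_ols, a_half, b_half, mean_resid_low)
```

With $\theta = 1/2$ every observation receives the same weight, so the first reweighting step already reproduces OLS. With $\theta < 1/2$ positive residuals are penalized more heavily, the fitted line moves upward, and the average residual becomes negative.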

Again, we caution the reader that in our model residual signs are not
used as a priori knowledge. Obviously, in that instance OLS would be
optimal only by coincidence, namely when the (unconstrained) OLS
solution happened to satisfy all the given sign constraints.